Overview

Dataset statistics

Number of variables: 15
Number of observations: 921
Missing cells: 991
Missing cells (%): 7.2%
Duplicate rows: 2
Duplicate rows (%): 0.2%
Total size in memory: 108.1 KiB
Average record size in memory: 120.1 B
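
The percentage figures above appear to follow directly from the raw counts. The sketch below recomputes them, assuming that "Missing cells (%)" is taken over all cells (variables × observations) and that the average record size is the total memory divided by the number of observations. Because the total size is itself rounded to 108.1 KiB, the recomputed record size only matches the reported 120.1 B approximately.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 15
n_observations = 921
missing_cells = 991
duplicate_rows = 2
total_size_kib = 108.1

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: average record size = total bytes / number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```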

Variable types

Numeric: 6
Categorical: 9

Alerts

Dataset has 2 (0.2%) duplicate rows [Duplicates]
oldpeak is highly correlated with slope and 2 other fields [High correlation]
slope is highly correlated with cp and 3 other fields [High correlation]
ca is highly correlated with restecg and 4 other fields [High correlation]
thal is highly correlated with cp and 4 other fields [High correlation]
dataset is highly correlated with chol and 2 other fields [High correlation]
cp is highly correlated with slope and 1 other field [High correlation]
trestbps is highly correlated with exang [High correlation]
chol is highly correlated with dataset [High correlation]
restecg is highly correlated with ca and 1 other field [High correlation]
thalach is highly correlated with exang [High correlation]
exang is highly correlated with trestbps and 1 other field [High correlation]
num is highly correlated with oldpeak [High correlation]
trestbps has 59 (6.4%) missing values [Missing]
fbs has 83 (9.0%) missing values [Missing]
thalach has 55 (6.0%) missing values [Missing]
exang has 55 (6.0%) missing values [Missing]
oldpeak has 63 (6.8%) missing values [Missing]
slope has 120 (13.0%) missing values [Missing]
ca has 321 (34.9%) missing values [Missing]
thal has 221 (24.0%) missing values [Missing]
chol has 172 (18.7%) zeros [Zeros]
oldpeak has 370 (40.2%) zeros [Zeros]
ca has 181 (19.7%) zeros [Zeros]

Reproduction

Analysis started: 2022-10-17 20:05:34.848158
Analysis finished: 2022-10-17 20:05:38.376127
Duration: 3.53 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

age
Real number (ℝ≥0)

Distinct: 50
Distinct (%): 5.4%
Missing: 1
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.51086957
Minimum: 28
Maximum: 77
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 28
5-th percentile: 37
Q1: 47
median: 54
Q3: 60
95-th percentile: 68
Maximum: 77
Range: 49
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.42468521
Coefficient of variation (CV): 0.1761265568
Kurtosis: -0.3829298183
Mean: 53.51086957
Median Absolute Deviation (MAD): 6.5
Skewness: -0.1959938616
Sum: 49230
Variance: 88.8246913
Monotonicity: Not monotonic
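
Several of these figures are tied together by simple identities, which makes them easy to sanity-check. The sketch below uses only the values reported for age; the variable names are illustrative and not part of the report.

```python
# Summary figures copied from the age variable.
mean = 53.51086957
std = 9.42468521
total = 49230
q1, q3 = 47, 60
minimum, maximum = 28, 77

cv = std / mean                      # coefficient of variation
variance = std ** 2                  # variance is the squared standard deviation
value_range = maximum - minimum      # range
iqr = q3 - q1                        # interquartile range
n_non_missing = round(total / mean)  # sum / mean recovers the non-missing count

print(cv, variance, value_range, iqr, n_non_missing)
```

The recovered count of 920 agrees with 921 observations minus the one missing value.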
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
54                  51     5.5%
58                  43     4.7%
55                  41     4.5%
56                  38     4.1%
57                  38     4.1%
52                  36     3.9%
62                  35     3.8%
51                  35     3.8%
59                  35     3.8%
53                  33     3.6%
Other values (40)   535    58.1%

Value               Count  Frequency (%)
28                  1      0.1%
29                  3      0.3%
30                  1      0.1%
31                  2      0.2%
32                  5      0.5%
33                  2      0.2%
34                  7      0.8%
35                  11     1.2%
36                  6      0.7%
37                  11     1.2%

Value               Count  Frequency (%)
77                  2      0.2%
76                  2      0.2%
75                  3      0.3%
74                  7      0.8%
73                  1      0.1%
72                  4      0.4%
71                  5      0.5%
70                  7      0.8%
69                  13     1.4%
68                  10     1.1%

sex
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 1
Missing (%): 0.1%
Memory size: 7.3 KiB
1.0: 726
0.0: 194

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2760
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 0.0

Common Values

Value        Count  Frequency (%)
1.0          726    78.8%
0.0          194    21.1%
(Missing)    1      0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
1.0726
78.9%
0.0194
 
21.1%

Most occurring characters

ValueCountFrequency (%)
01114
40.4%
.920
33.3%
1726
26.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1840
66.7%
Other Punctuation920
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01114
60.5%
1726
39.5%
Other Punctuation
ValueCountFrequency (%)
.920
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2760
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01114
40.4%
.920
33.3%
1726
26.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII2760
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01114
40.4%
.920
33.3%
1726
26.3%

cp
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.4%
Missing: 1
Missing (%): 0.1%
Memory size: 7.3 KiB
4.0: 496
3.0: 204
2.0: 174
1.0: 46

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2760
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 4.0
3rd row: 4.0
4th row: 3.0
5th row: 2.0

Common Values

Value        Count  Frequency (%)
4.0          496    53.9%
3.0          204    22.1%
2.0          174    18.9%
1.0          46     5.0%
(Missing)    1      0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
4.0496
53.9%
3.0204
22.2%
2.0174
 
18.9%
1.046
 
5.0%

Most occurring characters

ValueCountFrequency (%)
.920
33.3%
0920
33.3%
4496
18.0%
3204
 
7.4%
2174
 
6.3%
146
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1840
66.7%
Other Punctuation920
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0920
50.0%
4496
27.0%
3204
 
11.1%
2174
 
9.5%
146
 
2.5%
Other Punctuation
ValueCountFrequency (%)
.920
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2760
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
.920
33.3%
0920
33.3%
4496
18.0%
3204
 
7.4%
2174
 
6.3%
146
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII2760
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
.920
33.3%
0920
33.3%
4496
18.0%
3204
 
7.4%
2174
 
6.3%
146
 
1.7%

trestbps
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 62
Distinct (%): 7.2%
Missing: 59
Missing (%): 6.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 131.9686775
Minimum: -9
Maximum: 200
Zeros: 1
Zeros (%): 0.1%
Negative: 1
Negative (%): 0.1%
Memory size: 7.3 KiB

Quantile statistics

Minimum: -9
5-th percentile: 105
Q1: 120
median: 130
Q3: 140
95-th percentile: 160
Maximum: 200
Range: 209
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 19.65197056
Coefficient of variation (CV): 0.1489139009
Kurtosis: 5.37538339
Mean: 131.9686775
Median Absolute Deviation (MAD): 10
Skewness: -0.2115463257
Sum: 113757
Variance: 386.1999469
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
120                 131    14.2%
130                 115    12.5%
140                 102    11.1%
110                 59     6.4%
150                 56     6.1%
160                 50     5.4%
125                 29     3.1%
115                 19     2.1%
135                 18     2.0%
128                 17     1.8%
Other values (52)   266    28.9%
(Missing)           59     6.4%

Value               Count  Frequency (%)
-9                  1      0.1%
0                   1      0.1%
80                  1      0.1%
92                  1      0.1%
94                  2      0.2%
95                  6      0.7%
96                  1      0.1%
98                  1      0.1%
100                 15     1.6%
101                 1      0.1%

Value               Count  Frequency (%)
200                 4      0.4%
192                 1      0.1%
190                 2      0.2%
185                 1      0.1%
180                 12     1.3%
178                 3      0.3%
174                 1      0.1%
172                 2      0.2%
170                 14     1.5%
165                 2      0.2%

chol
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 218
Distinct (%): 23.9%
Missing: 8
Missing (%): 0.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 193.8871851
Minimum: -9
Maximum: 603
Zeros: 172
Zeros (%): 18.7%
Negative: 23
Negative (%): 2.5%
Memory size: 7.3 KiB

Quantile statistics

Minimum: -9
5-th percentile: 0
Q1: 167
median: 222
Q3: 267
95-th percentile: 331.8
Maximum: 603
Range: 612
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 114.1394835
Coefficient of variation (CV): 0.5886901881
Kurtosis: -0.1884844527
Mean: 193.8871851
Median Absolute Deviation (MAD): 48
Skewness: -0.5620771833
Sum: 177019
Variance: 13027.82169
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   172    18.7%
-9                  23     2.5%
220                 10     1.1%
254                 10     1.1%
211                 9      1.0%
230                 9      1.0%
204                 9      1.0%
216                 9      1.0%
219                 9      1.0%
223                 9      1.0%
Other values (208)  644    69.9%

Value               Count  Frequency (%)
-9                  23     2.5%
0                   172    18.7%
85                  1      0.1%
100                 2      0.2%
117                 1      0.1%
126                 1      0.1%
129                 1      0.1%
131                 1      0.1%
132                 1      0.1%
139                 1      0.1%

Value               Count  Frequency (%)
603                 1      0.1%
564                 1      0.1%
529                 1      0.1%
518                 1      0.1%
491                 1      0.1%
468                 1      0.1%
466                 1      0.1%
458                 1      0.1%
417                 1      0.1%
412                 1      0.1%

fbs
Categorical

MISSING

Distinct: 3
Distinct (%): 0.4%
Missing: 83
Missing (%): 9.0%
Memory size: 7.3 KiB
0.0: 692
1.0: 138
-9.0: 8

Length

Max length: 4
Median length: 3
Mean length: 3.009546539
Min length: 3

Characters and Unicode

Total characters: 2522
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value        Count  Frequency (%)
0.0          692    75.1%
1.0          138    15.0%
-9.0         8      0.9%
(Missing)    83     9.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0.0692
82.6%
1.0138
 
16.5%
9.08
 
1.0%

Most occurring characters

ValueCountFrequency (%)
01530
60.7%
.838
33.2%
1138
 
5.5%
-8
 
0.3%
98
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1676
66.5%
Other Punctuation838
33.2%
Dash Punctuation8
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01530
91.3%
1138
 
8.2%
98
 
0.5%
Other Punctuation
ValueCountFrequency (%)
.838
100.0%
Dash Punctuation
ValueCountFrequency (%)
-8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2522
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01530
60.7%
.838
33.2%
1138
 
5.5%
-8
 
0.3%
98
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII2522
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01530
60.7%
.838
33.2%
1138
 
5.5%
-8
 
0.3%
98
 
0.3%

restecg
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.4%
Missing: 2
Missing (%): 0.2%
Memory size: 7.3 KiB
0.0: 551
2.0: 188
1.0: 179
-9.0: 1

Length

Max length: 4
Median length: 3
Mean length: 3.001088139
Min length: 3

Characters and Unicode

Total characters: 2758
Distinct characters: 6
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 0.0
5th row: 2.0

Common Values

Value        Count  Frequency (%)
0.0          551    59.8%
2.0          188    20.4%
1.0          179    19.4%
-9.0         1      0.1%
(Missing)    2      0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0.0551
60.0%
2.0188
 
20.5%
1.0179
 
19.5%
9.01
 
0.1%

Most occurring characters

ValueCountFrequency (%)
01470
53.3%
.919
33.3%
2188
 
6.8%
1179
 
6.5%
-1
 
< 0.1%
91
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1838
66.6%
Other Punctuation919
33.3%
Dash Punctuation1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01470
80.0%
2188
 
10.2%
1179
 
9.7%
91
 
0.1%
Other Punctuation
ValueCountFrequency (%)
.919
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2758
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01470
53.3%
.919
33.3%
2188
 
6.8%
1179
 
6.5%
-1
 
< 0.1%
91
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2758
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01470
53.3%
.919
33.3%
2188
 
6.8%
1179
 
6.5%
-1
 
< 0.1%
91
 
< 0.1%

thalach
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 120
Distinct (%): 13.9%
Missing: 55
Missing (%): 6.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 137.3764434
Minimum: -9
Maximum: 202
Zeros: 0
Zeros (%): 0.0%
Negative: 1
Negative (%): 0.1%
Memory size: 7.3 KiB

Quantile statistics

Minimum: -9
5-th percentile: 94.25
Q1: 120
median: 140
Q3: 157
95-th percentile: 178
Maximum: 202
Range: 211
Interquartile range (IQR): 37

Descriptive statistics

Standard deviation: 26.38547681
Coefficient of variation (CV): 0.1920669669
Kurtosis: 0.443605595
Mean: 137.3764434
Median Absolute Deviation (MAD): 20
Skewness: -0.3792931077
Sum: 118968
Variance: 696.1933866
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
150                 43     4.7%
140                 41     4.5%
120                 35     3.8%
130                 30     3.3%
160                 26     2.8%
110                 21     2.3%
170                 20     2.2%
125                 20     2.2%
122                 16     1.7%
145                 14     1.5%
Other values (110)  600    65.1%
(Missing)           55     6.0%

Value               Count  Frequency (%)
-9                  1      0.1%
60                  1      0.1%
63                  1      0.1%
67                  1      0.1%
69                  1      0.1%
70                  1      0.1%
71                  1      0.1%
72                  2      0.2%
73                  1      0.1%
77                  1      0.1%

Value               Count  Frequency (%)
202                 1      0.1%
195                 1      0.1%
194                 1      0.1%
192                 1      0.1%
190                 2      0.2%
188                 2      0.2%
187                 1      0.1%
186                 2      0.2%
185                 4      0.4%
184                 4      0.4%

exang
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.3%
Missing: 55
Missing (%): 6.0%
Memory size: 7.3 KiB
0.0: 528
1.0: 337
-9.0: 1

Length

Max length: 4
Median length: 3
Mean length: 3.001154734
Min length: 3

Characters and Unicode

Total characters: 2599
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value        Count  Frequency (%)
0.0          528    57.3%
1.0          337    36.6%
-9.0         1      0.1%
(Missing)    55     6.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0.0528
61.0%
1.0337
38.9%
9.01
 
0.1%

Most occurring characters

ValueCountFrequency (%)
01394
53.6%
.866
33.3%
1337
 
13.0%
-1
 
< 0.1%
91
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1732
66.6%
Other Punctuation866
33.3%
Dash Punctuation1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01394
80.5%
1337
 
19.5%
91
 
0.1%
Other Punctuation
ValueCountFrequency (%)
.866
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2599
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01394
53.6%
.866
33.3%
1337
 
13.0%
-1
 
< 0.1%
91
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2599
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01394
53.6%
.866
33.3%
1337
 
13.0%
-1
 
< 0.1%
91
 
< 0.1%

oldpeak
Real number (ℝ)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 53
Distinct (%): 6.2%
Missing: 63
Missing (%): 6.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8787878788
Minimum: -2.6
Maximum: 6.2
Zeros: 370
Zeros (%): 40.2%
Negative: 12
Negative (%): 1.3%
Memory size: 7.3 KiB

Quantile statistics

Minimum: -2.6
5-th percentile: 0
Q1: 0
median: 0.5
Q3: 1.5
95-th percentile: 3
Maximum: 6.2
Range: 8.8
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 1.091226248
Coefficient of variation (CV): 1.241740214
Kurtosis: 1.127069239
Mean: 0.8787878788
Median Absolute Deviation (MAD): 0.5
Skewness: 1.041426615
Sum: 754
Variance: 1.190774725
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   370    40.2%
1                   83     9.0%
2                   76     8.3%
1.5                 48     5.2%
3                   28     3.0%
0.5                 19     2.1%
1.2                 17     1.8%
2.5                 16     1.7%
0.8                 15     1.6%
1.4                 15     1.6%
Other values (43)   171    18.6%
(Missing)           63     6.8%

Value               Count  Frequency (%)
-2.6                1      0.1%
-2                  1      0.1%
-1.5                1      0.1%
-1.1                1      0.1%
-1                  2      0.2%
-0.9                1      0.1%
-0.8                1      0.1%
-0.7                1      0.1%
-0.5                2      0.2%
-0.1                1      0.1%

Value               Count  Frequency (%)
6.2                 1      0.1%
5.6                 1      0.1%
5                   1      0.1%
4.4                 1      0.1%
4.2                 2      0.2%
4                   8      0.9%
3.8                 1      0.1%
3.7                 1      0.1%
3.6                 4      0.4%
3.5                 2      0.2%

slope
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): 0.5%
Missing: 120
Missing (%): 13.0%
Memory size: 7.3 KiB
2.0: 345
1.0: 203
-9.0: 190
3.0: 63

Length

Max length: 4
Median length: 3
Mean length: 3.237203496
Min length: 3

Characters and Unicode

Total characters: 2593
Distinct characters: 7
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3.0
2nd row: 2.0
3rd row: 2.0
4th row: 3.0
5th row: 1.0

Common Values

Value        Count  Frequency (%)
2.0          345    37.5%
1.0          203    22.0%
-9.0         190    20.6%
3.0          63     6.8%
(Missing)    120    13.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
2.0345
43.1%
1.0203
25.3%
9.0190
23.7%
3.063
 
7.9%

Most occurring characters

ValueCountFrequency (%)
.801
30.9%
0801
30.9%
2345
13.3%
1203
 
7.8%
-190
 
7.3%
9190
 
7.3%
363
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1602
61.8%
Other Punctuation801
30.9%
Dash Punctuation190
 
7.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0801
50.0%
2345
21.5%
1203
 
12.7%
9190
 
11.9%
363
 
3.9%
Other Punctuation
ValueCountFrequency (%)
.801
100.0%
Dash Punctuation
ValueCountFrequency (%)
-190
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2593
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
.801
30.9%
0801
30.9%
2345
13.3%
1203
 
7.8%
-190
 
7.3%
9190
 
7.3%
363
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII2593
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
.801
30.9%
0801
30.9%
2345
13.3%
1203
 
7.8%
-190
 
7.3%
9190
 
7.3%
363
 
2.4%

ca
Real number (ℝ)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 6
Distinct (%): 1.0%
Missing: 321
Missing (%): 34.9%
Infinite: 0
Infinite (%): 0.0%
Mean: -3.986666667
Minimum: -9
Maximum: 9
Zeros: 181
Zeros (%): 19.7%
Negative: 290
Negative (%): 31.5%
Memory size: 7.3 KiB

Quantile statistics

Minimum: -9
5-th percentile: -9
Q1: -9
median: 0
Q3: 0
95-th percentile: 2
Maximum: 9
Range: 18
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 4.910873992
Coefficient of variation (CV): -1.23182458
Kurtosis: -1.857966283
Mean: -3.986666667
Median Absolute Deviation (MAD): 3
Skewness: 0.01476777637
Sum: -2392
Variance: 24.11668336
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value               Count  Frequency (%)
-9                  290    31.5%
0                   181    19.7%
1                   67     7.3%
2                   41     4.5%
3                   20     2.2%
9                   1      0.1%
(Missing)           321    34.9%

Value               Count  Frequency (%)
-9                  290    31.5%
0                   181    19.7%
1                   67     7.3%
2                   41     4.5%
3                   20     2.2%
9                   1      0.1%

Value               Count  Frequency (%)
9                   1      0.1%
3                   20     2.2%
2                   41     4.5%
1                   67     7.3%
0                   181    19.7%
-9                  290    31.5%

thal
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): 0.6%
Missing: 221
Missing (%): 24.0%
Memory size: 7.3 KiB
-9.0: 266
3.0: 196
7.0: 192
6.0: 46

Length

Max length: 4
Median length: 3
Mean length: 3.38
Min length: 3

Characters and Unicode

Total characters: 2366
Distinct characters: 7
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 6.0
2nd row: 3.0
3rd row: 7.0
4th row: 3.0
5th row: 3.0

Common Values

Value        Count  Frequency (%)
-9.0         266    28.9%
3.0          196    21.3%
7.0          192    20.8%
6.0          46     5.0%
(Missing)    221    24.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
9.0266
38.0%
3.0196
28.0%
7.0192
27.4%
6.046
 
6.6%

Most occurring characters

ValueCountFrequency (%)
.700
29.6%
0700
29.6%
-266
 
11.2%
9266
 
11.2%
3196
 
8.3%
7192
 
8.1%
646
 
1.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1400
59.2%
Other Punctuation700
29.6%
Dash Punctuation266
 
11.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0700
50.0%
9266
 
19.0%
3196
 
14.0%
7192
 
13.7%
646
 
3.3%
Other Punctuation
ValueCountFrequency (%)
.700
100.0%
Dash Punctuation
ValueCountFrequency (%)
-266
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2366
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
.700
29.6%
0700
29.6%
-266
 
11.2%
9266
 
11.2%
3196
 
8.3%
7192
 
8.1%
646
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII2366
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
.700
29.6%
0700
29.6%
-266
 
11.2%
9266
 
11.2%
3196
 
8.3%
7192
 
8.1%
646
 
1.9%

num
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.5%
Missing: 1
Missing (%): 0.1%
Memory size: 7.3 KiB
0.0: 411
1.0: 196
2.0: 135
3.0: 135
4.0: 43

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2760
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 2.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value        Count  Frequency (%)
0.0          411    44.6%
1.0          196    21.3%
2.0          135    14.7%
3.0          135    14.7%
4.0          43     4.7%
(Missing)    1      0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0.0411
44.7%
1.0196
21.3%
2.0135
 
14.7%
3.0135
 
14.7%
4.043
 
4.7%

Most occurring characters

ValueCountFrequency (%)
01331
48.2%
.920
33.3%
1196
 
7.1%
2135
 
4.9%
3135
 
4.9%
443
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1840
66.7%
Other Punctuation920
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01331
72.3%
1196
 
10.7%
2135
 
7.3%
3135
 
7.3%
443
 
2.3%
Other Punctuation
ValueCountFrequency (%)
.920
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2760
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01331
48.2%
.920
33.3%
1196
 
7.1%
2135
 
4.9%
3135
 
4.9%
443
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII2760
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01331
48.2%
.920
33.3%
1196
 
7.1%
2135
 
4.9%
3135
 
4.9%
443
 
1.6%

dataset
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 KiB
va: 495
cleveland: 303
switzerland: 123

Length

Max length: 11
Median length: 2
Mean length: 5.504885993
Min length: 2

Characters and Unicode

Total characters: 5070
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: cleveland
2nd row: cleveland
3rd row: cleveland
4th row: cleveland
5th row: cleveland

Common Values

Value        Count  Frequency (%)
va           495    53.7%
cleveland    303    32.9%
switzerland  123    13.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
va495
53.7%
cleveland303
32.9%
switzerland123
 
13.4%

Most occurring characters

ValueCountFrequency (%)
a921
18.2%
v798
15.7%
l729
14.4%
e729
14.4%
n426
8.4%
d426
8.4%
c303
 
6.0%
s123
 
2.4%
w123
 
2.4%
i123
 
2.4%
Other values (3)369
7.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5070
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a921
18.2%
v798
15.7%
l729
14.4%
e729
14.4%
n426
8.4%
d426
8.4%
c303
 
6.0%
s123
 
2.4%
w123
 
2.4%
i123
 
2.4%
Other values (3)369
7.3%

Most occurring scripts

ValueCountFrequency (%)
Latin5070
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a921
18.2%
v798
15.7%
l729
14.4%
e729
14.4%
n426
8.4%
d426
8.4%
c303
 
6.0%
s123
 
2.4%
w123
 
2.4%
i123
 
2.4%
Other values (3)369
7.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII5070
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a921
18.2%
v798
15.7%
l729
14.4%
e729
14.4%
n426
8.4%
d426
8.4%
c303
 
6.0%
s123
 
2.4%
w123
 
2.4%
i123
 
2.4%
Other values (3)369
7.3%

Interactions

[36 interaction plots; images omitted]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
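
As a minimal sketch of that recipe (not the report's own implementation), the code below ranks each variable, giving tied values the average of their positions, and then applies the covariance-over-standard-deviations formula to the ranks. The function names are illustrative, and the cubic toy data is an assumption chosen to show a nonlinear but monotonic relationship.

```python
from statistics import fmean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y), pearson(x, y))
```

On this toy data ρ is 1 (up to floating-point error) while r stays below 1, which is the behaviour the paragraph above describes.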

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
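
The sketch below follows that definition with the standard library only and then checks the invariance property on a small sample. The ten age and thalach values are the ones listed under "First rows" in the Sample section, so the resulting r describes only those ten rows, not the full dataset. The helper name `pearson_r` and the particular shift and scale constants are illustrative assumptions.

```python
from statistics import fmean, pstdev


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


# First ten rows of the report's sample table.
age = [63, 67, 67, 37, 41, 56, 62, 57, 63, 53]
thalach = [150, 108, 129, 187, 172, 178, 160, 163, 147, 155]

r = pearson_r(age, thalach)

# Shift and rescale each variable separately (positive scale factors).
r_shifted = pearson_r([2 * a + 3 for a in age], [0.5 * t - 10 for t in thalach])

print(r, r_shifted)
```

Both calls return the same value up to floating-point error, because the shifts cancel in the deviations from the mean and the scale factors cancel between numerator and denominator.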

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
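
A direct, quadratic-time sketch of this pair-counting definition is shown below. It implements the simple form described here (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries often use a tie-adjusted variant instead, so results on data with many ties may differ from the report's heatmap. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 means a tie in x or y; it counts for neither side
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the last call two of the three pairs are concordant and one is discordant, so τ = (2 - 1) / 3.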

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
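
For illustration, the sketch below computes a bias-corrected Cramér's V from two lists of category labels using the correction commonly attributed to Bergsma (2013): the squared phi coefficient and both table dimensions are shrunk by terms of order 1/(n - 1). Whether the report applies exactly these steps is an assumption; the function name and toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy data: perfectly associated labels, then independent labels.
perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(perfect, independent)
```

The perfectly associated pair yields a value of 1 (up to floating-point error) and the independent pair yields 0, matching the range described above.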

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
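
One way to read "nullity correlation" is as the ordinary correlation between presence indicators: each column is turned into 1 where a value is present and 0 where it is missing, and the indicators are then correlated. The sketch below assumes that reading. The three short columns are hypothetical toy data (they borrow variable names from this report but not its values), and the function name is illustrative.

```python
from statistics import fmean, pstdev


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(p - ma) * (q - mb) for p, q in zip(a, b)])
    # Undefined (division by zero) if a column is fully present or fully missing.
    return cov / (pstdev(a) * pstdev(b))


# Hypothetical columns: slope and ca are missing together, thal is missing
# exactly where the other two are present.
slope = [3.0, None, 2.0, None, 1.0, None]
ca = [0.0, None, 3.0, None, 0.0, None]
thal = [None, 6.0, None, 3.0, None, 7.0]

together = nullity_correlation(slope, ca)
opposite = nullity_correlation(slope, thal)
print(together, opposite)
```

A value near +1 means two variables tend to be present or missing in the same rows; a value near -1 means one tends to be present exactly when the other is missing.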
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

     age   sex   cp    trestbps  chol   fbs   restecg  thalach  exang  oldpeak  slope  ca    thal  num  dataset
0    63.0  1.0   1.0   145.0     233.0  1.0   2.0      150.0    0.0    2.3      3.0    0.0   6.0   0.0  cleveland
1    67.0  1.0   4.0   160.0     286.0  0.0   2.0      108.0    1.0    1.5      2.0    3.0   3.0   2.0  cleveland
2    67.0  1.0   4.0   120.0     229.0  0.0   2.0      129.0    1.0    2.6      2.0    2.0   7.0   1.0  cleveland
3    37.0  1.0   3.0   130.0     250.0  0.0   0.0      187.0    0.0    3.5      3.0    0.0   3.0   0.0  cleveland
4    41.0  0.0   2.0   130.0     204.0  0.0   2.0      172.0    0.0    1.4      1.0    0.0   3.0   0.0  cleveland
5    56.0  1.0   2.0   120.0     236.0  0.0   0.0      178.0    0.0    0.8      1.0    0.0   3.0   0.0  cleveland
6    62.0  0.0   4.0   140.0     268.0  0.0   2.0      160.0    0.0    3.6      3.0    2.0   3.0   3.0  cleveland
7    57.0  0.0   4.0   120.0     354.0  0.0   0.0      163.0    1.0    0.6      1.0    0.0   3.0   0.0  cleveland
8    63.0  1.0   4.0   130.0     254.0  0.0   2.0      147.0    0.0    1.4      2.0    1.0   7.0   2.0  cleveland
9    53.0  1.0   4.0   140.0     203.0  1.0   2.0      155.0    1.0    3.1      3.0    0.0   7.0   1.0  cleveland

Last rows

     age   sex   cp    trestbps  chol   fbs   restecg  thalach  exang  oldpeak  slope  ca    thal  num  dataset
911  42.0  1.0   4.0   140.0     358.0  0.0   0.0      170.0    0.0    0.0      -9.0   -9.0  -9.0  0.0  va
912  51.0  0.0   3.0   110.0     190.0  0.0   0.0      120.0    0.0    0.0      -9.0   -9.0  -9.0  0.0  va
913  59.0  1.0   4.0   140.0     -9.0   0.0   0.0      140.0    0.0    0.0      -9.0   0.0   -9.0  0.0  va
914  53.0  1.0   2.0   120.0     -9.0   0.0   0.0      132.0    0.0    0.0      -9.0   -9.0  -9.0  0.0  va
915  48.0  0.0   2.0   -9.0      308.0  0.0   1.0      -9.0     -9.0   2.0      1.0    -9.0  -9.0  0.0  va
916  36.0  1.0   2.0   120.0     166.0  0.0   0.0      180.0    0.0    0.0      -9.0   -9.0  -9.0  0.0  va
917  48.0  1.0   3.0   110.0     211.0  0.0   0.0      138.0    0.0    0.0      -9.0   -9.0  6.0   0.0  va
918  47.0  0.0   2.0   140.0     257.0  0.0   0.0      135.0    0.0    1.0      1.0    -9.0  -9.0  0.0  va
919  53.0  1.0   4.0   130.0     182.0  0.0   0.0      148.0    0.0    0.0      -9.0   -9.0  -9.0  0.0  va
920  NaN   NaN   NaN   NaN       NaN    NaN   NaN      NaN      NaN    NaN      NaN    NaN   NaN   NaN  va

Duplicate rows

Most frequently occurring

     age   sex   cp    trestbps  chol   fbs   restecg  thalach  exang  oldpeak  slope  ca    thal  num  dataset  # duplicates
0    49.0  0.0   2.0   110.0     -9.0   0.0   0.0      160.0    0.0    0.0      -9.0   -9.0  -9.0  0.0  va       2
1    58.0  1.0   3.0   150.0     219.0  0.0   1.0      118.0    1.0    0.0      NaN    NaN   NaN   2.0  va       2